Executive Summary 

The 45th Weather Squadron (45 WS) includes the probability of lightning occurrence in their daily 
morning briefings. This forecast is important in the warm season months, May-September, when the area 
is most affected by lightning. Forecasters use this information to assess the likelihood that launch 
commit criteria and weather flight rules will be violated, and to plan daily ground operations on 
Kennedy Space Center and Cape Canaveral Air Force Station (CCAFS). The lightning probability 
forecast is based on the output from an objective lightning forecast tool developed by the Applied 
Meteorology Unit (AMU) that is supplemented by subjective analyses of model and observational data. 
This tool was developed over two phases. In Phase I, the AMU developed a set of equations that calculate 
the probability of lightning occurrence for the day that outperformed the previous operational tool by 
48%, and a graphical user interface (GUI) to input the parameter values and display the output. In Phase 
II, the equations were redeveloped with new data, and the GUI transitioned to the Meteorological 
Interactive Data Display System (MIDDS). The MIDDS GUI retrieves the required predictor values 
automatically, reducing work load on the forecasters. The Phase II equations outperformed Phase I by 
8%, for a total combined improvement in skill of 56%. 

The success of the previous work led the 45 WS to task the AMU with Phase III to improve the tool 
further. The period of record was increased from 17 to 20 years (1989-2008), and data for October were 
included. The main goal was to create equations based on the progression of the lightning probabilities in 
the daily climatology instead of creating an equation for each warm season month. Five distinct 
sub-seasons can be discerned from the daily lightning climatology. An equation for each of these sub-seasons 
would be created under the assumption that they would capture the physical attributes that contribute to 
thunderstorm formation more so than monthly equations. 

The data sources were the same as for Phase II and included the Cloud-to-Ground Lightning 
Surveillance System (CGLSS), 1200 UTC Florida synoptic soundings, and the 1000 UTC CCAFS 
sounding (XMR). Data from CGLSS were used to determine lightning occurrence for each day. The 1200 
UTC Florida and 1000 UTC XMR soundings were used to determine the flow regime and the 1000 UTC 
XMR soundings were used to calculate local stability parameters for each day. These datasets were 
processed and analyzed to create the predictand and candidate predictors needed for the statistical forecast 
equation development. The CGLSS data were used to create a binary predictand for lightning: a ‘1’ for 
lightning occurrence during the day and a ‘0’ for non-occurrence. The flow regimes and stability 
parameters from the soundings were used to calculate the candidate predictors of lightning occurrence. 

The AMU stratified the data into two subsets: a development dataset containing 16 warm seasons 
from which the equations were developed, and a verification dataset of 4 warm seasons on which the 
equations were tested. Before the equations could be developed, the data had to be stratified again into 
sub-seasons. The sub-season start dates were not expected to be identical in every year, so the AMU 
developed and tested three methods to determine the start dates in each year. The ground truth for testing 
came from a set of historical wet-season start dates determined by the National Weather Service in 
Melbourne, Fla. None of the three methods was able to determine the actual start dates in each year. 
Therefore, the start dates were specified by the daily climatology and were the same in every year. 

The methods for developing and testing the equations were identical to those followed in Phase II. 
One logistic regression equation was developed for each sub-season, and the resulting five equations 
contained one to three predictors. The performance of these equations was compared to that of five other 
forecast methods, including the Phase II equations. The new equations outperformed every method except 
Phase II. Therefore, the Phase III equations will not replace the Phase II equations in operations. The 
reason for the degradation could be that the same sub-season start dates were used in every year. It is 
likely there was overlap of sub-season days at the beginning and end of each defined sub-season in each 
individual year, which could affect the predictors chosen, their coefficients in the logistic regression and, 
ultimately, equation performance. Future work should include an effort to create an objective method that 
determines the start dates of the sub-seasons in each individual year. 
1 Introduction 


The 45th Weather Squadron (45 WS) forecasters include a probability of lightning occurrence in their 
daily 24-Hour and Weekly Planning forecasts, which are briefed to the 45 WS staff in the morning at 
1100 UTC (0700 EDT) and released for customer use at 1130 UTC (0730 EDT). Forecasters at the 
Spaceflight Meteorology Group also make thunderstorm forecasts during shuttle operations. The 
probability of lightning occurrence is used by personnel in determining the possibility of violating launch 
commit criteria and shuttle weather flight rules, and planning for daily ground operation activities on 
Kennedy Space Center (KSC) and Cape Canaveral Air Force Station (CCAFS). This forecast is critically 
important in the warm season months, May-September, when the area is most affected by lightning. 

The lightning probability forecast is based on the output from an objective lightning forecast tool 
developed by the Applied Meteorology Unit (AMU; Bauman et al. 2004) that is supplemented by 
subjective analyses of model and observational data. This tool was developed over two phases. In Phase I, 
the AMU developed five equations, one for each warm season month, that calculate the probability of 
lightning occurrence for the day (Lambert and Wheeler 2005) and a Microsoft® Excel© graphical user 
interface (GUI) to display the output. The GUI allowed forecasters to interface with the equations by 
entering predictor values to output a probability of lightning occurrence. In Phase II (Lambert 2007), the 
equations were redeveloped by using two more years of data and modified predictors, and the GUI was 
transitioned to the Meteorological Interactive Data Display System (MIDDS). The Phase I equations 
outperformed several forecast methods used in operations and the Phase II equations, in turn, 
outperformed the Phase I equations. 

Based on the successes in the previous phases, the 45 WS tasked the AMU with Phase III to improve 
the tool further. Three warm seasons were added to increase the period of record (POR) to 20 years 
(1989-2008), and October data were included to capture the end of the lightning season. The main goal 
was to create equations based on the progression of the lightning season in the daily climatology instead 
of creating an equation for each warm season month. The assumption was that these equations would 
capture the physical attributes that contribute to thunderstorm formation more so than monthly equations. 

1.1 Phases I and II 

The Phase I objective lightning probability tool was a set of five logistic regression equations that 
calculated the probability of lightning occurrence for the day (Lambert and Wheeler 2005) in a 
rectangular area that encompassed all 5 NM lightning warning circles on KSC and CCAFS. They were 
developed using a 15-year (1989-2003) archive of Cloud-to-Ground Lightning Surveillance System 
(CGLSS) data, 1200 UTC Florida synoptic soundings, and the 1000 UTC CCAFS sounding (XMR). 
These equations outperformed the operational forecast methods used by the 45 WS. In particular, they 
outperformed the Neumann-Pfeffer Thunderstorm Index (NPTI) (Neumann 1971) by 48%. They also 
demonstrated good reliability, an ability to distinguish between non-lightning and lightning days, and 
improved standard categorical accuracy measures and skill scores over persistence. To facilitate 
user-friendly interaction with the equations, the AMU created a GUI using Microsoft Excel Visual Basic®. 
During GUI development, the 45 WS provided comments on the design to ensure it addressed their 
operational needs. Based on the test results, the GUI and equations were transitioned to operations for the 
2005 warm season and replaced the NPTI as the official lightning forecast tool. 
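
For readers unfamiliar with the form of these equations, a logistic regression equation converts a weighted sum of predictor values into a probability between 0 and 1. The sketch below shows that general form only. The function name, intercept, coefficients, and predictor values are hypothetical placeholders and are not the values in the AMU equations.

```python
import math

def lightning_probability(intercept, coefficients, predictors):
    """Generic logistic regression: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    if len(coefficients) != len(predictors):
        raise ValueError("need one coefficient per predictor")
    z = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical intercept, coefficients, and predictor values (illustration only).
p = lightning_probability(-2.0, [0.05, 0.03], [30.0, 20.0])
print(f"Probability of lightning: {100 * p:.0f}%")
```

Whatever the predictor values, the output stays strictly between 0 and 1, which is why this form suits a yes/no predictand such as daily lightning occurrence.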

In Phase II, the AMU re-created the five logistic regression equations with five modifications: 

• Increased the POR from 15 to 17 years (1989-2005), 

• Modified the valid area to only include the 5 NM warning circle areas, 

• Included the XMR 1000 UTC sounding in determining the flow regime of the day, 

• Used a different smoothing function for the daily lightning climatology, and 

• Determined the optimal layer for the average relative humidity (RH) candidate predictor. 
The Phase II equations outperformed the Phase I equations by 8%, and showed better performance 
than the Phase I equations in four other tests. As a result, the AMU replaced the Phase I equations in the 
GUI with the Phase II equations and transitioned it to operations. The Phase II equations are currently 
being used in operations. In addition to the Excel GUI, Mr. Paul Wahner of Computer Sciences Raytheon 
(CSR) created a MIDDS GUI to have the same look. This made it easier for forecasters to transition from 
the Excel to the MIDDS GUI. More importantly, the MIDDS GUI retrieved the required sounding 
parameter values automatically for the equations. To use the Excel GUI, the forecasters had to gather the 
sounding values and enter them in the GUI manually. This increased the risk of entering an incorrect 
value and calculating an erroneous probability. It also increased the time forecasters spent in preparing the 
daily and weekly forecasts. The MIDDS GUI reduces the possibility of human error and increases 
efficiency, allowing forecasters to do other duties. 

1.2 Phase III 

For Phase III, the 45 WS tasked the AMU to update the Phase II equations with three modifications. 
The first was to increase the POR to 20 years by adding the warm season data from the three years 
2006-2008. The daily climatology for the Phase II POR in Figure 1 illustrates the driving factors for the other 
two modifications. The smoothed values were created with a Gaussian center-weighted function (Lambert 
2007). The 14-day smoothed values show the climatologies tapering off through the end of September, 
but not leveling out as can be seen at the beginning of May. Therefore, the second modification was to 
add October data to the POR to determine if a climatological end to the lightning season could be found 
in that month. 

The third, and primary, modification was to stratify the data by the progression of the lightning 
season instead of by month for the equation development. This progression can be seen best in the 14-day 
smoothed daily climatology values in Figure 1. The values are low and the curve is flat in the first part of 
May. The curve increases from mid-May to mid- to late-June, where it then plateaus through mid-August. 
The lightning probability trend then decreases through September. The goal was to stratify the data into 
“sub-seasons” at the inflection points where the trends changed instead of by month, and create an 
equation for each sub-season. Such stratification could capture the physical properties important to 
thunderstorm formation within each sub-season, possibly resulting in better-performing equations. 


Warm Season Daily Lightning Climatology 
1989-2005 



Figure 1. The Phase II daily raw (green), ±7-day smoothed (blue), and 
±14-day smoothed (red) climatological probability values of lightning 
occurrence for the warm season months in 1989-2005. 
2 Data 


The POR for the data used to develop the forecast equations was increased from 17 to 20 years by 
adding the data collected during the 2006-2008 warm seasons, and the October data for 1989-2008. The 
data sources include the 

• CGLSS, 

• 1200 UTC Jacksonville (JAX), Tampa (TBW), and Miami (MFL) Fla. soundings, and 

• 1000 UTC XMR sounding. 

Data from CGLSS, the local network of cloud-to-ground lightning sensors, were used to determine 
lightning occurrence within a defined area (see Section 3.1) for each day. The 1000 UTC XMR and 1200 
UTC JAX, TBW, and MFL soundings were used to calculate the daily flow regimes, and the 1000 UTC 
XMR soundings were used to calculate the standard stability parameters that are readily available to the 
forecasters. Each data type is discussed in this report for completeness, but for brevity only the 
information pertaining to Phase III is included. More details on each data type can be found in the Phase I 
final report (Lambert and Wheeler 2005). 

2.1 CGLSS 

The CGLSS is a network of six sensors (Figure 2) that collects date/time, latitude/longitude, peak 
current, and polarity information of cloud-to-ground lightning strikes in the local area. Mr. Steve Madison 
of CSR provided the additional data for the 2006-2008 warm seasons and the October data. The CGLSS 
data were used to determine whether or not lightning occurred on each day in the POR. The primary 
purpose of the CGLSS data was to create the binary predictand for the equations. The data were also used 
to create the daily climatological lightning frequency and persistence forecasts that would be used as 
candidate predictors and forecast methods against which to test the new equations. 



Figure 2. The locations of the six CGLSS sensors are indicated by the red 
circles. The location names are next to the circles. 
2.2 Florida 1200 UTC Soundings 

These data were collected to determine the daily flow regimes (Lericos et al. 2002, Lambert 2007). 
The AMU downloaded sounding data for the 2006-2008 warm seasons and October 1989-2008 from the 
Global Systems Division/Earth System Research Laboratory web site (http://www.esrl.noaa.gov/raobs/). As 
noted in Lericos et al. (2002), the current MFL and JAX sites were located at West Palm Beach, Fla. 
(PBI) and Waycross, Ga. (AYS), respectively, prior to 1995. The PBI and AYS data were used as proxies 
for MFL and JAX, respectively, during the period 1989-1994. All future references to MFL and JAX 
include the 1989-1994 data from PBI and AYS, respectively. The map in Figure 3 shows the locations of all the 
soundings. 

Use of the 1200 UTC sounding may seem inappropriate as it cannot provide data in time for the 
1100 UTC briefing. The previous 0000 UTC sounding was ruled out because contamination by afternoon 
convective circulations could mask the larger scale flow pattern at this time. For the purpose of 
determining the flow regimes for each day in the POR, the 1200 UTC sounding provided the most 
reliable data. Due to the weak synoptic patterns during the Florida warm season, it is not likely that a flow 
regime change would take place in the two-hour period between 1000-1200 UTC. In an operational 
setting, the 45 WS can use several data sources, including model output and surface observations, to 
determine the flow regime of the day before the 1100 UTC briefing. Specific suggestions for data sources 
and procedures that can be used to determine the flow regime are discussed in Section 7.22 in the Phase II 
final report (Lambert 2007). 


2.3 XMR 1000 UTC Soundings 


The XMR sounding location is shown in Figure 3. The 45 WS forecasters use data from the 
1000 UTC sounding for the 1100 UTC morning briefing since it contains the most recent information on 
the state of the atmosphere over the area. The XMR soundings were used to supplement the Florida 1200 UTC soundings 
in determining the flow regime of the day and to calculate the sounding parameters normally available to 
the forecasters through MIDDS. The probability of lightning occurrence based on flow regime and the 
XMR sounding parameters were used as candidate predictors in the equation development. 
Figure 3. The red dots on the map show the locations of all 
soundings used in this task. 



3 Data Preparation 

The AMU processed the three datasets described in Section 2 to create the equation elements needed 
for the statistical forecast equation development: the predictand and candidate predictors. The predictand 
is the element to be predicted from a predictor or group of predictors. There was one predictand value and 
one set of candidate predictor values per day in the POR. More details of how the data were processed to 
create these elements are given in the Phase II final report (Lambert 2007). All data were processed using 
the S-PLUS® statistical software package (Insightful Corporation 2007). 

3.1 CGLSS 

The CGLSS data provided the ground truth of whether or not lightning occurred within the 5 NM 
circles for which the 45 WS has forecasting responsibility (Figure 4) for each day in the POR. The data 
were filtered spatially to include only strikes that occurred within these circles, and temporally to include 
only lightning strikes recorded in the time period 0700-0000 EDT. 
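
The spatial and temporal filter described above can be sketched as follows. This is a minimal illustration, not the AMU's S-PLUS code. The function names and the circle center are hypothetical, strike times are assumed to be already converted to EDT, and the great-circle distance formula is an assumed choice because the report does not state how distance was computed.

```python
import math
from datetime import datetime

NM_TO_KM = 1.852          # one nautical mile in kilometres
EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def distance_nm(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two points in nautical miles."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a)) / NM_TO_KM

def keep_strike(time_edt, lat, lon, centers, radius_nm=5.0):
    """True if the strike is at or after 0700 EDT and inside any warning circle."""
    in_window = time_edt.hour >= 7  # 0700 EDT through the end of the calendar day
    in_circle = any(distance_nm(lat, lon, clat, clon) <= radius_nm
                    for clat, clon in centers)
    return in_window and in_circle

# Hypothetical circle center (illustration only, not an actual warning circle).
centers = [(28.60, -80.60)]
print(keep_strike(datetime(2008, 7, 1, 15, 0), 28.62, -80.61, centers))
```

A strike is kept if it falls inside at least one circle, so overlapping circles do not count a strike twice.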



Figure 4. The 5 NM lightning warning circles on 
KSC/CCAFS. The valid area is within the four blue 
(KSC) and six red (CCAFS) circles. 


The AMU used the filtered CGLSS data to create the predictand as well as the 1-day persistence and 
daily climatology candidate predictors. The value of the predictand was binary: ‘1’ if one or more strikes 
were detected within the defined time and space, ‘0’ if no lightning was detected. The 1-day persistence 
predictor was also binary: If lightning occurred on one day, the persistence value for the next day was ‘1’. 
If lightning did not occur, the persistence value was ‘0’. The predictand values were used to create the 
daily climatology. Figure 5 shows the raw, 7-day smoothed, and 14-day smoothed daily climatology 
curves. Details on how the values were calculated are in the Phase II final report (Lambert 2007). As in 
Phase II, the 14-day smoothed values were used for the daily climatology in the equation development. 
The new May-October 1989-2008 daily lightning climatology values in Figure 5 were consistent with the 
previous 1989-2005 climatology (Figure 1), being only ~1% lower on average. 
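
The construction of the predictand, the 1-day persistence predictor, and a smoothed daily climatology can be sketched as below. The function names are hypothetical. The raw climatology is assumed to be the fraction of years with lightning on each day, and the Gaussian weights (in particular the default `sigma`) are an assumed form, since the actual smoothing function is documented in the Phase II final report rather than here.

```python
import math

def binary_predictand(strike_counts):
    """1 if one or more strikes were detected on the day, else 0."""
    return [1 if n >= 1 else 0 for n in strike_counts]

def persistence(predictand):
    """Persistence for each day is the previous day's predictand (None on day 1)."""
    return [None] + predictand[:-1]

def daily_climatology(predictand_by_year):
    """Raw climatological probability per day: fraction of years with lightning."""
    n_years = len(predictand_by_year)
    return [sum(day) / n_years for day in zip(*predictand_by_year)]

def gaussian_smooth(values, half_width, sigma=None):
    """Center-weighted Gaussian mean over +/- half_width days.

    The window is truncated at the ends of the series; sigma defaults to
    half_width / 2, which is an assumption made for this sketch.
    """
    if half_width < 1:
        raise ValueError("half_width must be at least 1")
    sigma = sigma if sigma is not None else half_width / 2.0
    smoothed = []
    for i in range(len(values)):
        num = den = 0.0
        for k in range(-half_width, half_width + 1):
            j = i + k
            if 0 <= j < len(values):
                w = math.exp(-0.5 * (k / sigma) ** 2)
                num += w * values[j]
                den += w
        smoothed.append(num / den)
    return smoothed

y = binary_predictand([0, 3, 0, 0, 12])   # strike counts for five days
print(y, persistence(y))
```

In a multi-year dataset the persistence value for the first day of each warm season would need the last day before the season, which this sketch leaves as `None`.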
One of the questions to be answered by this work was whether the October data should be included as 
part of the overall lightning season. Given that the daily values at the beginning of October are higher 
than those at the beginning of May, the 45 WS and AMU agreed that these data should be included. 


Warm Season Daily Lightning Climatology 
1989 - 2008 



Figure 5. The Phase III daily raw (green curve), ±7-day smoothed (blue curve), and 
±14-day smoothed (red curve) climatological probability values of lightning occurrence 
for the warm-season months including October in 1989-2008. 

3.2 Soundings 

The AMU used the Florida synoptic and XMR soundings to determine the flow regime of the day. 
The first step was to determine the synoptic flow regime of the day through a combination of the average 
1000-700 mb wind directions from the 1200 UTC MFL, TBW, and JAX soundings, as outlined in 
Lericos et al. (2002). Table 1 shows the criteria used to determine the daily flow regime. The 
mathematical procedure used to calculate the average wind direction in the 1000-700 mb layer is in 
Lambert and Wheeler (2005). The next step was to calculate the average 1000-700 mb wind directions in 
the 1000 UTC XMR soundings, which were used to determine the ‘local’ flow regime of the day 
(Lambert 2007). The local flow regime was used to determine the final flow regime of the day when the 
synoptic regime was Other, Missing, SE-1, or SW-2. In the SE-1 and SW-2 regimes, the ridge axis from 
the high over the Atlantic Ocean was just north or south of TBW, respectively. Exactly where the ridge 
was located relative to KSC/CCAFS was unknown. If the synoptic regime was SE-1 but the local regime 
showed southwest flow, SE-1 was replaced with SW-2, and vice versa. 
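
The two-step flow regime assignment can be sketched as follows, using the sectors in Table 1. This is an illustration under stated assumptions: the function names are hypothetical, the sectors are treated as half-open intervals (a direction of exactly 180° falls in the 180°-270° sector), and the SE-2 sector at JAX and the NE sector at MFL are taken to be 90°-180° and 0°-90°, respectively, based on the regime descriptions. Only the SE-1/SW-2 swap is shown; the handling of the Other and Missing regimes is described in Lambert (2007) and is not reproduced here.

```python
# Sectors of the layer-average 1000-700 mb wind direction (degrees) at
# MFL, TBW, and JAX for each synoptic flow regime.
SECTORS = {
    "SW-1": ((180, 270), (180, 270), (180, 270)),
    "SW-2": ((90, 180), (180, 270), (180, 270)),
    "SE-1": ((90, 180), (90, 180), (180, 270)),
    "SE-2": ((90, 180), (90, 180), (90, 180)),
    "NW": ((270, 360), (270, 360), (270, 360)),
    "NE": ((0, 90), (0, 90), (0, 90)),
}

def in_sector(direction, sector):
    """Half-open sector test (assumption): low <= direction < high."""
    low, high = sector
    return low <= direction % 360 < high

def synoptic_regime(mfl, tbw, jax):
    """Classify the day from the three layer-average wind directions."""
    if None in (mfl, tbw, jax):
        return "Missing"
    for name, sectors in SECTORS.items():
        if all(in_sector(d, s) for d, s in zip((mfl, tbw, jax), sectors)):
            return name
    return "Other"

def final_regime(synoptic, xmr_direction):
    """Swap SE-1 and SW-2 when the local XMR flow disagrees with the synoptic regime."""
    if xmr_direction is None:
        return synoptic
    if synoptic == "SE-1" and in_sector(xmr_direction, (180, 270)):
        return "SW-2"
    if synoptic == "SW-2" and in_sector(xmr_direction, (90, 180)):
        return "SE-1"
    return synoptic

print(final_regime(synoptic_regime(120, 150, 200), 225))
```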

The flow regimes were used with the CGLSS predictand to calculate the probability of lightning 
occurrence for each flow regime in Table 1 within each lightning sub-season to be used as candidate 
predictors in the equation development. However, a method for determining the sub-season dates had to 
be developed first. The methods tested for stratifying the data by sub-season are discussed in Section 4. 
Table 1. List of the flow regime names used in Phases I and II and the corresponding sectors showing 
the average 1000-700 mb wind directions at each of the stations. 

                                                                      Rawinsonde Station
Flow Regime  Description                                         MFL        TBW        JAX

SW-1         Subtropical ridge south of MFL;                     180°-270°  180°-270°  180°-270°
             southwest flow over KSC/CCAFS

SW-2         Subtropical ridge north of MFL, south of TBW;       90°-180°   180°-270°  180°-270°
             southwest flow over KSC/CCAFS

SE-1         Subtropical ridge north of TBW, south of JAX;       90°-180°   90°-180°   180°-270°
             southeast flow over KSC/CCAFS

SE-2         Subtropical ridge north of JAX;                     90°-180°   90°-180°   90°-180°
             southeast flow over KSC/CCAFS

NW           Northwest flow over Florida, likely from a          270°-360°  270°-360°  270°-360°
             stronger-than-average subtropical ridge south of
             MFL extending into Gulf of Mexico

NE           Northeast flow over Florida, likely from a          0°-90°     0°-90°     0°-90°
             stronger-than-average subtropical ridge north of
             JAX extending into southeast U.S., at times
             forming a closed high pressure center

Other        When the layer-averaged wind directions at the
             three stations did not fit in a defined flow regime

Missing      One or more soundings missing

The 1000 UTC XMR soundings were also used to calculate the stability indices normally available to 
the forecasters through MIDDS. In order to calculate the same values that would be available to the 
forecasters, the AMU used the same equations as are used in the MIDDS code. All the routines that the 
AMU developed in Phase I to create the stability indices were used in Phase III. 

The stability index candidate predictors included the 

• Total Totals, 

• Cross Totals, 

• Vertical Totals (VT), 

• K-Index (KI), 

• Lifted Index (LI), 

• Thompson Index (TI), 

• Severe Weather Threat (SWEAT) Index, 

• Showalter Stability Index, 

• Temperature at 500 mb (T500), 

• Mean RH in the 800-600 mb layer (Avg.86.RH), 

• Mean RH in the 825-525 mb layer (Avg.85.RH), 

• Precipitable water (PW), 

• Mean wind speed in the 1000-700 mb layer, and 

• The lapse rate between the 700 and 500 mb levels in °C/km (L57). 
The formulas used to calculate the indices are standard and can be found in several sources (e.g. 
Peppler and Lamb 1989). Five indices in the above list are not readily available to the forecasters: VT, TI, 
Avg.86.RH, Avg.85.RH, and L57. The TI is calculated easily with the equation TI = KI - LI, as is VT 
with VT = T850 - T500. Avg.86.RH and Avg.85.RH were calculated using a weighted average described in the 
Phase II final report (Lambert 2007). L57 is the absolute value of (T500 - T700)/(Height500 - Height700). 
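
The three simple derived indices can be written out as a short sketch. The function names and the sample sounding values are hypothetical, temperatures are assumed to be in °C, and heights are assumed to be in meters so that the lapse rate comes out in °C/km.

```python
def thompson_index(k_index, lifted_index):
    """TI = KI - LI."""
    return k_index - lifted_index

def vertical_totals(t850_c, t500_c):
    """VT = T850 - T500 (temperatures in degrees C)."""
    return t850_c - t500_c

def lapse_rate_700_500(t700_c, t500_c, z700_m, z500_m):
    """L57 = |(T500 - T700) / (Height500 - Height700)| in degrees C per km."""
    dz_km = (z500_m - z700_m) / 1000.0  # heights assumed to be in meters
    return abs((t500_c - t700_c) / dz_km)

# Hypothetical sounding values (illustration only).
ti = thompson_index(k_index=30.0, lifted_index=-3.0)
vt = vertical_totals(t850_c=17.0, t500_c=-8.0)
l57 = lapse_rate_700_500(t700_c=8.0, t500_c=-8.0, z700_m=3150.0, z500_m=5850.0)
print(ti, vt, round(l57, 2))
```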

3.3 Data for Sub-Season Stratification 

Five distinct sub-seasons are evident from the 14-day smoothed curve in Figure 5 (dates are 
approximate): 

1) Pre-lightning 1-13 May, 

2) Ramp-up 14 May-22 June, 

3) Lightning 23 June-12 August, 

4) Ramp-down 13 August-12 October, and 

5) Post-lightning 13-31 October. 

The actual sub-seasons in different years likely start on different days. To stratify the data properly, the 
start dates of each sub-season in each individual year should be used. The method to choose the dates 
would have to be objective and repeatable so that forecasters could use the same procedure in real-time 
operations. The development and testing of three methods is discussed in Section 4. 
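
As a point of reference for the year-specific methods, the simplest stratification assigns each day to a sub-season using the approximate climatological dates listed above, identical in every year. The sketch below shows that fixed-date baseline only; the names are hypothetical.

```python
from datetime import date

# (sub-season name, start month, start day), from the approximate
# climatological dates listed above.
SUB_SEASONS = [
    ("Pre-lightning", 5, 1),
    ("Ramp-up", 5, 14),
    ("Lightning", 6, 23),
    ("Ramp-down", 8, 13),
    ("Post-lightning", 10, 13),
]

def sub_season(day):
    """Return the sub-season name for a date between 1 May and 31 October."""
    if not 5 <= day.month <= 10:
        raise ValueError("date is outside the 1 May-31 October warm season")
    label = None
    for name, month, start_day in SUB_SEASONS:
        # Keep the last sub-season whose start date is on or before this day.
        if (day.month, day.day) >= (month, start_day):
            label = name
    return label

print(sub_season(date(2008, 6, 23)))
```

A year-specific method would replace the fixed `SUB_SEASONS` table with start dates chosen separately for each year.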

3.3.1 Ground-Truth Dates 

Determining the accuracy of an objective method to choose sub-season start dates would be difficult 
without the aid of ground-truth dates. In 2002, the National Weather Service in Melbourne, Fla. (NWS 
MLB) conducted a study to determine the signatures for the start of the wet and dry seasons (Lascody 
2002). Through objective and subjective analysis of several surface and sounding variables, the study 
determined the start dates for Orlando in the years 1949-2002 and Daytona Beach in the years 
1935-2002. The dates have been determined every year since the study ended, and the lists now contain start 
dates through 2009. 

Because of the extensive objective and subjective analysis done, the AMU determined the NWS MLB 
wet and dry season start dates would be used as ground-truth dates for the lightning and post-lightning 
sub-season start dates, respectively, in each POR year. An objective method developed by the AMU would be considered 
a success if it was able to choose dates within seven days of these dates. The AMU chose the Daytona 
Beach dates due to that city’s close proximity to the Atlantic Ocean, similar to KSC/CCAFS. Table 2 
shows the NWS MLB dates in Month/Day order to show the distribution of start dates regardless of year. 
The last row in the table shows the median wet and dry season start dates for the Phase III 20-year POR. 
Table 2. The NWS MLB wet and dry season start dates for Daytona Beach in the 
Phase III POR years 1989-2008. 

        Wet Season Start                  Dry Season Start
Date    Year    Date    Year      Date    Year    Date    Year

--      1991    --      2003      27 Sep  2006    16 Oct  2002
--      1997    3 Jun   1990      30 Sep  2001    16 Oct  2004
20 May  1995    4 Jun   1989      6 Oct   1992    17 Oct  1994
21 May  1996    6 Jun   2007      7 Oct   1991    18 Oct  1995
27 May  1992    7 Jun   2002      9 Oct   1996    19 Oct  1989
27 May  1999    10 Jun  2008      9 Oct   2000    19 Oct  1997
29 May  1994    11 Jun  2006      12 Oct  1993    22 Oct  1999
30 May  2001    12 Jun  1993      14 Oct  2005    24 Oct  1990
30 May  2005    22 Jun  2000      14 Oct  2008    24 Oct  1998
1 Jun   2004    6 Jul   1998      15 Oct  2003    3 Nov   2007

Median: 2 June                    Median: 16 October


3.3.2 Data for Objective Method 

Discussions with the NWS MLB and 45 WS forecasters as well as the results from the NWS MLB 
study indicated that PW might be a good variable to use in determining the start of the sub-seasons. Prior 
to the AMU calculating the stability indices from the XMR sounding, the 14th Weather Squadron 
(14 WS) calculated the daily means and standard deviations of PW over the 20-year POR for the 45 WS. 
To compare them with the smoothed daily climatology, the AMU applied the 14-day smoothing 
algorithm to the PW means and standard deviations. These values and the daily climatology from Figure 
5 are shown in Figure 6 along with the start dates of the wet and dry seasons from the NWS MLB study. 
The last dry season date, 3 November, is not shown in the chart since the x-axis does not go beyond 
31 October. 

The median wet season start date of 2 June is close to half way along the upward trend in daily 
climatologies, and the median dry season start date of 16 October is close to where the daily climatology 
decreases to the same values as at the beginning of the season in early May. The smoothed PW mean 
values show trends similar to the daily climatology. They peak a few days later in June than the daily 
climatology values. The plateau of PW means lasts into early September, but the daily climatology values 
begin to decline in mid-August. A “bump” in values exists in both curves during late September. The 
smoothed standard deviations are steady throughout the season until toward late September when they 
steadily increase. The similarities between the curves at the beginning of the warm season indicate that 
PW may be a good indicator of the beginning of the ramp-up and lightning sub-seasons, but it may not be 
helpful in determining the beginning of the ramp-down and post-lightning sub-seasons. 
14-Day Smoothed Values 1989 - 2008 

Figure 6. The 14-day Gaussian-smoothed daily climatology from Figure 5 (left axis, magenta line) and 
14 WS PW (right axis) means (solid blue line) and standard deviations (dashed blue line) with the NWS 
MLB wet season start (green circles) and end (red Xs) dates in the POR. The values for the start and end 
dates are plotted using the vertical axis on the right. A value of 0.1 means the wet season began/ended 
only once on that date in the POR, and 0.2 means it began/ended on that date twice in different years. 

After calculating all the parameters listed in Section 3.2, the AMU analyzed their 20-year means to 
determine which might be good indicators of sub-season beginning dates, and found that LI, KI, TI, PW, 
Avg.86.RH and Avg.85.RH showed the most promise. Figure 7 shows the daily 20-year mean values for 
these parameters. All but LI begin at relatively low values, increase through days 45-50 (14-19 June), 
plateau through days 145-150 (22-27 September), and then decrease through day 184 (31 October). The 
LI had opposite and much less pronounced trends. The increase in KI, TI, PW and the two RH values as 
well as the decrease in LI are consistent with the increase in daily climatology. While the trends at the 
beginning of the warm season closely match the KSC/CCAFS daily climatology, the decrease for all 
parameters is approximately a month later than for the daily climatology (Figure 6). 

The AMU then analyzed the standard deviations of the mean values in Figure 7, shown in Figure 8. 
At the beginning of the warm season, the standard deviations of most parameters were only slightly less 
than, if not equal to, their associated mean. This indicated too much variance in the parameters to make 
good indicators of lightning sub-season dates. However, the PW standard deviations were 1/3 to 1/4 of 
their mean values. As the warm season progressed to the lightning sub-season, the standard deviations 
decreased, but increased again toward the end of the season. Even the PW standard deviations became larger toward 
the end of the warm season. Due to the relatively low standard deviations, PW is a better candidate for 
discerning the beginning of the ramp-up and lightning sub-seasons. However, the larger standard 
deviations toward the end of the warm season indicate PW might not be a good discriminator for the 
beginning of the ramp-down and post-lightning sub-seasons. 


Figure 7. The 20-year daily mean values of Avg.85.RH, Avg.86.RH, KI, LI, PW, and TI. The 
magnitude of the values is on the vertical axis, and the warm season day numbers, beginning with 
1=1 May and ending with 184=31 October, are along the horizontal axis. RH values are in percent and 
PW values are in mm. 


20-Year Standard Deviation Values of Sounding Parameters 
from the 1000 UTC CCAFS Sounding 





Figure 8. As in Figure 7, but for the 20-year daily standard deviation values. 
4 Sub-Season Start Dates 


Once all the data were prepared, the AMU began developing and testing methods to determine the 
beginning dates of the sub-seasons. Given the trends in the PW means and standard deviations in Figure 7 
and Figure 8, this was the main parameter used in the development. Also due to the closer association of 
the PW and daily climatology curves at the beginning of the season and the lower PW standard 
deviations, the AMU began by developing and testing methods to determine the start of the ramp-up and 
lightning sub-seasons. If successful, the technique would be extended to determine the start of the ramp- 
down and post-lightning sub-seasons. Once the start dates could be determined, the flow regime lightning 
probabilities could be calculated for each sub-season. 

The AMU tested three statistical methods to determine which could choose the sub-season start dates: 

• Number of occurrences above a PW threshold value from the beginning of the warm season, 

• One-Sample t Test on the running PW mean, and 

• Multiple discriminant analysis (MDA). 

As stated in Section 3.3.1, the NWS MLB dates were used as ground truth in determining the ability of 
the techniques to identify the start of the lightning sub-season. To be successful, the technique had to 
choose a date within one week (± 7 days) before or after the NWS MLB date in each year. 

4.1 Chronological Check 

The first method was a simple chronological check of the number of occurrences of a threshold PW 
value. Since the daily PW values in any individual year can be highly variable from day to day, especially 
early in the season, the first occurrence of a threshold value is not likely a good indicator of the start of a 
sub-season. The AMU created an algorithm that began at Day 1 (1 May) and checked the daily PW values 
in chronological order. At the point where the daily lightning climatology values start to increase in mid- 
May, the average PW value is 1.2 in (30.5 mm). At the point where the values begin to plateau in June, it 
is 1.75 in (44.5 mm). 

The start of the ramp-up sub-season was defined reasonably well after the third occurrence of 
PW > 1.2 in. There is no equivalent ground truth for this date, only the daily lightning climatology. The 
average start-day for the ramp-up sub-season using this technique was 10 May, only three days earlier 
than the apparent start-day of 13 May from the daily climatology. The start days ranged from 3-19 May 
over the 20-year POR. 

The search for the start of the lightning sub-season began the day after the ramp-up start date. In some 
years with moist days at the beginning of the season, the ramp-up and lightning start dates were the same 
if both algorithms began at 1 May. Setting the number of occurrences to three and using a PW threshold 
of > 1.75 in, the algorithm was only able to choose lightning sub-season start dates within one week of the 
NWS MLB start dates in 8 of the 20 years. A 40% success rate was not acceptable. In the other 12 years, 
dates were chosen that were 2-3 weeks before or after the NWS MLB wet season start date. The AMU 
varied the number of occurrences from two to five and the threshold from 1.7-1.8 in, but no combination 
produced better results. 
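
The chronological check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the argument names, and the sample PW values are hypothetical, and the occurrences are counted whether or not they fall on consecutive days, which the text does not specify.

```python
from typing import Optional, Sequence

def nth_exceedance_day(pw_in: Sequence[float], threshold_in: float,
                       n_occurrences: int = 3, first_day: int = 1) -> Optional[int]:
    """Return the warm-season day number of the n-th day with PW > threshold.

    pw_in[0] is Day 1 (1 May). Days before first_day are ignored.
    Returns None if the threshold is not exceeded n_occurrences times.
    """
    count = 0
    for day, pw in enumerate(pw_in, start=1):
        if day < first_day:
            continue
        if pw > threshold_in:
            count += 1
            if count == n_occurrences:
                return day
    return None

# Made-up daily PW values (inches) for the first 12 days of one season.
pw = [0.9, 1.3, 1.1, 1.25, 1.0, 1.4, 1.8, 1.6, 1.9, 1.76, 1.5, 1.8]

# Ramp-up: third occurrence of PW > 1.2 in, searching from Day 1.
ramp_up_day = nth_exceedance_day(pw, 1.2)

# Lightning: third occurrence of PW > 1.75 in, searching from the day after ramp-up.
lightning_day = nth_exceedance_day(pw, 1.75, first_day=ramp_up_day + 1)
```

Starting the second search the day after the ramp-up date keeps the two start dates from coinciding in years that are moist early in the season.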

4.2 One-Sample t Test 

The next method tested to determine the start date of the lightning sub-season was the one-sample t 
test. This test is used to determine if an observed sample mean was drawn from a population with a 
predetermined mean. The t value is given by the equation 

$$ t = \frac{\bar{x} - \mu_0}{[\mathrm{Var}(\bar{x})]^{1/2}} $$

where in this case $\bar{x}$ is the running PW mean, $\mu_0$ is the predetermined PW threshold to define the start of 
the lightning sub-season, and $\mathrm{Var}(\bar{x})$ is the sample estimate of the sample mean variance defined as 

$$ \mathrm{Var}(\bar{x}) = s^2 / n , $$

where $s^2$ is the sample variance ($s$ is the standard deviation) and $n$ is the sample size (Wilks 2006). The 
null hypothesis, $H_0$, is that $\bar{x}$ is drawn from a population whose mean is $\mu_0$, and the alternative hypothesis 
is that the mean is not $\mu_0$. For a small value of $t$, the difference in the numerator is small compared to the 
standard error in the denominator. If the difference is more than twice the denominator, $H_0$ is likely to be rejected 
(Wilks 2006). S-PLUS has a function for the t test that outputs the test statistic $t$ and a parameter called 
the p value. The p value is the probability of obtaining a value of $t$ at least as extreme as the one observed 
if $H_0$ is true. If p < the test level, in this case 5% or 0.05, then $H_0$ is rejected. 

The AMU began testing this method with $\mu_0$ = 1.75 in and n = 4 (days). The first day checked in each 
year was the day after the ramp-up start. The PW from each day plus the three days previous were used to 
calculate the running mean. If the p value indicated that the running mean was likely > $\mu_0$, the fourth day 
in the running mean was considered the start of the lightning sub-season. The results in comparison with 
the NWS MLB start dates were worse than the chronological check. Only 6 dates out of 20 were within 
one week of the NWS MLB dates. The AMU varied the number of days in the running mean (n) from 
three to eight, $\mu_0$ from 1.7 to 1.8 in, and the number of occurrences from one to three with similar or 
worse results. The other days not within one week of the ground-truth dates varied from two to three 
weeks before and after those dates, with no apparent pattern. 
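
The running-mean test can be illustrated with the Python sketch below. It is not the S-PLUS function used in the task. The function names and sample values are hypothetical, and the decision rule is an assumption: the running mean is treated as above the threshold when t is positive and exceeds a critical value, here 3.182, the two-sided 5% value for 3 degrees of freedom, instead of computing a p value.

```python
import math
from statistics import mean, variance

def t_statistic(sample, mu0):
    """One-sample t: (sample mean - mu0) / sqrt(s^2 / n)."""
    n = len(sample)
    return (mean(sample) - mu0) / math.sqrt(variance(sample) / n)

def first_day_running_mean_exceeds(pw, mu0, n=4, t_crit=3.182, first_day=1):
    """Return the first day number whose n-day running mean tests above mu0.

    pw[0] is Day 1. The window for a day is that day plus the n-1 days before it.
    t_crit must match n: 3.182 is the two-sided 5% critical value for
    n - 1 = 3 degrees of freedom (an assumed decision rule, see lead-in).
    """
    for day in range(max(first_day, n), len(pw) + 1):
        window = pw[day - n:day]
        if variance(window) == 0:
            continue  # t is undefined when all values in the window are identical
        if t_statistic(window, mu0) > t_crit:
            return day
    return None

# Made-up daily PW values (inches).
pw = [1.2, 1.3, 1.5, 1.6, 1.9, 2.0, 1.95, 2.05]
start_day = first_day_running_mean_exceeds(pw, mu0=1.75)
```

With only four values in each window the standard error is poorly estimated, so a single dry or moist day can move t substantially.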

4.3 Multiple Discriminant Analysis 

MDA is a statistical method used to discern between groups in a dataset. In this case, the AMU used 
it to discern between the ramp-up and lightning sub-seasons in each year, thereby determining the 
lightning sub-season start date. The steps in developing an MDA equation are given in Wilks (2006). The 
AMU used an equivalent function in S-PLUS to develop and test the MDA. 

4.3.1 MDA Data and Function 

The AMU began by creating the dataset needed to develop the MDA function. This dataset contained 
the year, month, day, PW, KI, LI, and TI for all dates in the POR, and a group parameter that identified 
whether the day was in the ramp-up or lightning sub-season. The NWS MLB wet-season start dates were 
the beginning points for the lightning sub-season in each year, and the ramp-up start dates were those 
chosen by the chronological check described in Section 4.1. The end of the lightning sub-season also had 
to be chosen so the development data contained only days from the ramp-up and lightning sub-seasons. 
This was estimated from the daily climatology in Figure 5 and Figure 6 to be 15 August in every year. 
This was the date just before the downward trend in lightning frequency. Data from the odd years in the 
POR were used for the MDA development, and the resulting function was first tested on the development 
data. Testing a predictive function on the data from which it was developed is a useful first check: if it does not 
perform well with the development data, it is unlikely to perform well with other data. 

The MDA function has the form 


$$ V = C_1 x_1 + C_2 x_2 + \ldots + C_n x_n , $$

where $V$ is the value used to determine the group classification (ramp-up or lightning), $C_n$ are the constant 
coefficients for the data values, $x_n$ are the variable values, and $n$ is the number of variables used in the 
MDA function. The coefficients are determined through complex matrix algebra using the variables and 
the group parameter in the development data set. The classification between two groups depends on 
whether $V$ is greater or less than a dividing point value, determined by using the mean values of the 
variables in the development data set for $x_1 \ldots x_n$ in the equation above. An MDA function was developed 
using all the data in the development data set combined, then tested on each individual year in that data 
set. The S-PLUS function automatically assigned a group classification to each day. 
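
For readers without S-PLUS, a two-group, two-variable version of such a function can be sketched in plain Python. This is an illustration only, assuming a Fisher linear discriminant with a pooled covariance matrix; it is not the S-PLUS routine the AMU used. The function name and the sample (PW, TI) pairs are hypothetical.

```python
from statistics import mean

def fit_two_group_discriminant(group_a, group_b):
    """Fit V = c1*x1 + c2*x2 for two groups of (x1, x2) pairs.

    Returns (c1, c2, dividing_point); V > dividing_point assigns a day to group_b.
    """
    def centroid(rows):
        return mean(r[0] for r in rows), mean(r[1] for r in rows)

    def scatter(rows, m):
        s11 = sum((r[0] - m[0]) ** 2 for r in rows)
        s12 = sum((r[0] - m[0]) * (r[1] - m[1]) for r in rows)
        s22 = sum((r[1] - m[1]) ** 2 for r in rows)
        return s11, s12, s22

    ma, mb = centroid(group_a), centroid(group_b)
    dof = len(group_a) + len(group_b) - 2
    # Pooled covariance matrix [[s11, s12], [s12, s22]].
    s11, s12, s22 = ((a + b) / dof
                     for a, b in zip(scatter(group_a, ma), scatter(group_b, mb)))
    det = s11 * s22 - s12 * s12
    d1, d2 = mb[0] - ma[0], mb[1] - ma[1]
    # Coefficients = inverse(pooled covariance) times (mean_b - mean_a).
    c1 = (s22 * d1 - s12 * d2) / det
    c2 = (-s12 * d1 + s11 * d2) / det
    # Dividing point: V evaluated at the mean of each variable over all
    # development days (equals the midpoint of the two group centroids
    # when the groups are the same size).
    all_rows = list(group_a) + list(group_b)
    g1, g2 = centroid(all_rows)
    return c1, c2, c1 * g1 + c2 * g2

# Made-up (PW in mm, TI) pairs for each group.
ramp_up = [(25, 10), (30, 15), (28, 20), (33, 18)]
lightning = [(42, 30), (47, 28), (45, 35), (50, 33)]
c1, c2, dividing = fit_two_group_discriminant(ramp_up, lightning)

def classify(pw_mm, ti):
    return "lightning" if c1 * pw_mm + c2 * ti > dividing else "ramp-up"
```

The sample groups here are cleanly separated, so every development day is classified correctly; real data with overlapping groups would not behave this well.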




4.3.2 MDA Development and Testing 

The AMU first used PW alone as the variable to develop the MDA (n = 1). This function performed 
poorly when tested with the development data, especially in years that were moist at the beginning of 
ramp-up or with dry spells in the ramp-up and lightning seasons. Ramp-up days were usually identified 
well at the start of that sub-season through mid-May, and lightning sub-season days were identified 
consistently usually starting in late June/early July. In the period between, the classifications for 
consecutive days became mixed such that no clear pattern or threshold point for the start of lightning 
season could be discerned. 

One parameter is usually not enough to forecast thunderstorm development with accuracy. The 
moisture represented by PW is critical, but instability is also needed. Therefore, the AMU added TI 
(n = 1,2) to the development, where TI = KI - LI: KI accounts for low-level moisture and lapse rate, and 
LI accounts for the low- to mid-level instability. The mean seasonal curve of TI had the same relation to 
lightning frequency as did PW (Figure 7). Even though the standard deviation of TI was high during May, 
it began decreasing in June and may be a good lightning sub-season discriminator when combined with 
PW. Indeed, when tested on the development data, the period of mixed classifications spanning the late 
ramp-up and early lightning sub-seasons decreased as compared to using only PW. Also, the number of 
successive days classified as lightning increased toward the end of the “mixed” period, interspersed with 
fewer and fewer successive days classified as ramp-up. 

Using a pattern of three days in a row with a lightning classification, the AMU was able to identify the 
beginning of the lightning sub-season within one week of the NWS MLB dates in 5 of the 10 odd years 
used to develop the MDA, and using four days in a row identified 4 of the 10 years. This was 
encouraging, but considering the MDA was tested on the data from which it was developed, this 
performance was not acceptable. Figure 9 shows the PW and TI values for the ramp-up (blue Xs) and 
lightning (red squares) sub-season days. Note the large overlap of the two sub-season groups highlighted 
by the red box between PW=20-40 and TI=0-35. For MDA to be effective, there must be a clearer 
separation of the groups. 
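
The consecutive-day rule and the one-week success criterion can be sketched as follows. The function names and the sample classifications are hypothetical, and taking the first day of the run as the start date is an assumption; the text does not say which day of the run was used.

```python
def first_run_start(labels, target="lightning", run_length=3):
    """Return the index of the first day of the first run of `run_length`
    consecutive `target` classifications, or None if there is no such run."""
    run = 0
    for i, label in enumerate(labels):
        run = run + 1 if label == target else 0
        if run == run_length:
            return i - run_length + 1
    return None

def within_one_week(chosen_day, truth_day):
    """Success criterion: chosen date within +/- 7 days of the ground-truth date."""
    return abs(chosen_day - truth_day) <= 7

# Made-up daily classifications spanning the "mixed" period.
labels = ["ramp-up", "lightning", "ramp-up", "lightning", "lightning",
          "ramp-up", "lightning", "lightning", "lightning", "lightning"]
start_3 = first_run_start(labels, run_length=3)
start_4 = first_run_start(labels, run_length=4)
```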


[Figure 9 plot: "Ramp-Up and Lightning Sub-Season TI vs PW Values"]



Figure 9. Scatter diagram of TI vs PW values for the ramp-up (blue Xs) and 
lightning (red boxes) sub-seasons for the odd years in 1989-2007. The red box 
surrounds the area of overlap between ramp-up and lightning sub-season days. 
4.4 Daily Climatology 

While not successful, the MDA method showed promise as a sub-season discriminator and should be 
explored in future work. However, due to time constraints on the task, the AMU and 45 WS agreed to end 
testing methods that would determine the sub-season start dates in each individual year and, instead, 
define the dates using the daily lightning climatology. The black Xs in Figure 10 show the beginning of 
each sub-season: 

• Ramp-up begins 18 May when the climatological lightning frequency starts to increase; 

• Lightning begins 6 June, the point at which the rate of increase in the frequencies begins to 
decrease and just four days later than the median NWS MLB wet season start date; 

• Ramp-down begins 17 August when the large decrease in lightning frequency begins; and 

• Post begins 12 October when the rate of decrease lessens and becomes steady and the value 
reaches 0.13, the same as in the pre-lightning sub-season. 


[Figure 10 plot: "Sub-Season Start Dates"]



Figure 10. The 1989 - 2008 daily lightning climatology with the sub-season start dates indicated 
by black Xs. 

The lightning sub-season start date appears out of place on the curve in Figure 10. The AMU and 
Mr. Roeder of the 45 WS discussed several methods for choosing this date. The probability values begin 
to plateau at > 0.48 on 23 June, which was the original start date of choice for the lightning sub-season. 
However, this date was later than the 90th percentile of the NWS MLB wet season start dates. They decided 
to determine the point at which the increase in probabilities began to decrease and use that date as the 
start-date for the lightning sub-season. This occurred on 6 June, just four days after the NWS MLB mean 
wet season start date of 2 June. 
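
Because the climatological start dates are the same in every year, assigning a date to a sub-season reduces to a lookup. The sketch below uses the four start dates from this section plus 1 May as the start of the pre-lightning sub-season; the function and constant names are hypothetical.

```python
from datetime import date

# (month, day) on which each sub-season begins, in chronological order.
SUB_SEASON_STARTS = [
    ((5, 1), "pre-lightning"),
    ((5, 18), "ramp-up"),
    ((6, 6), "lightning"),
    ((8, 17), "ramp-down"),
    ((10, 12), "post-lightning"),
]

def sub_season(d: date) -> str:
    """Return the sub-season name for a warm-season date (1 May-31 October)."""
    key = (d.month, d.day)
    if not (5, 1) <= key <= (10, 31):
        raise ValueError("date is outside the 1 May-31 October warm season")
    name = SUB_SEASON_STARTS[0][1]
    for start, label in SUB_SEASON_STARTS:
        if key >= start:
            name = label  # keep the latest sub-season that has already started
    return name
```

For example, `sub_season(date(2008, 6, 5))` returns `"ramp-up"` and `sub_season(date(2008, 6, 6))` returns `"lightning"`.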
4.5 Flow Regime Lightning Probabilities 

As stated earlier, the data had to be stratified into sub-seasons before the flow regime lightning 
probabilities could be calculated. The AMU stratified the data in each year into sub-seasons by the dates 
in Figure 10 and created flow regime lightning probabilities for each sub-season. A detailed description of 
the procedure to calculate these values is given in the Phase II final report (Lambert 2007). These 
probabilities were used as a candidate predictor in the equation development and are shown in Table 3. 
The values for the SW-1/SW-2 and for the SE-1/SE-2 regimes calculated separately were within 10% of 
each other. Therefore, the SW-1/SW-2 days and SE-1/SE-2 days were combined to increase the sample 
size and produce a more reliable probability value. 
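
The full procedure is documented in the Phase II report and is not reproduced here. As a rough sketch only, assuming each probability is simply the percent of days with lightning in a given sub-season and flow regime, the stratified calculation with combined SW and SE regimes could look like this. All names and sample records are hypothetical.

```python
from collections import defaultdict

# Assumed mapping that merges the paired regimes before counting.
MERGE = {"SW-1": "SW-1/2", "SW-2": "SW-1/2", "SE-1": "SE-1/2", "SE-2": "SE-1/2"}

def lightning_percent_by_group(days):
    """days: iterable of (sub_season, flow_regime, lightning_observed) tuples.

    Returns {(sub_season, regime): percent of days with lightning}.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [lightning days, total days]
    for sub, regime, lightning in days:
        key = (sub, MERGE.get(regime, regime))
        counts[key][0] += 1 if lightning else 0
        counts[key][1] += 1
    return {key: 100.0 * hits / total for key, (hits, total) in counts.items()}

# Made-up records for illustration.
records = [
    ("lightning", "SW-1", True),
    ("lightning", "SW-2", True),
    ("lightning", "SW-2", False),
    ("lightning", "NE", False),
]
probs = lightning_percent_by_group(records)
```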


Table 3. Sub-season probabilities of lightning occurrence in percent based on the 
flow regimes. The values in the far-right column are the sub-seasonal probabilities 
for all flow regimes combined. 

Sub-Season       SW-1/2   SE-1/2   NW   NE   Other   Monthly
Pre-Lightning      23       13     10    4     4       13
Ramp-Up            42       13     28    3    22       25
Lightning          67       31     51   13    42       48
Ramp-Down          54       32     17   14    28       32
Post-Season        28        5      1    6     5        9
5 Equation Development and Testing 

There were three major steps in this portion of the task: 

• Ascertain data availability, 

• Develop the logistic regression equations, and 

• Determine the equation performance. 

The amount of data available for equation development was critical to the reliability of the new equations. 
After determining that an appropriate amount of data was available, a set of five equations was developed, 
one for each sub-season in the warm season. The performance of the equations was assessed using several 
verification techniques appropriate for probability forecasts. 

5.1 Data Availability 

The amount of available data was determined before equation development began. This was 
important since the data had to be stratified into development and verification datasets followed by 
stratification into sub-season datasets, thereby limiting the amount of data available for equation 
development. To ensure the new equations would be reliable, ample data were required to create realistic 
relationships between the predictors and the predictand. The World Meteorological Organization (1992, 
hereafter WMO) states that there should be at least 250 events in the dataset in order to derive stable 
statistical relationships. This was the threshold in determining whether there were sufficient data. 

5.1.1 Missing Data 

There are 184 days in the warm season for this task, 1 May-31 October. This equates to 3680 days 
over the 20-year POR. Sounding data were not available every day. Data were considered missing for a 
specific day if one or more of the 1200 UTC Florida synoptic soundings (MFL, TBW, JAX) and the 1000 
UTC XMR sounding were missing to determine the flow regime, or when a 1000 UTC XMR sounding 
was missing to calculate the stability parameters. Table 4 shows the number of days in each sub-season, 
how many of those days had missing data, which dataset was missing, and the total number of days with 
available data. There were few cases in which data were missing from both datasets on the same day. The 
number in the third column under the heading “# MISSING DAYS” in Table 4 is less than the sum of the 
first two columns in every case because there were a few days in which data were missing from both 
datasets. The numbers of overlap cases are shown in parentheses in the third column. The last column in 
Table 4 shows that data availability ranged from 89-93% for each sub-season, and 90% overall. 
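
The bookkeeping behind the missing-day totals is a simple inclusion-exclusion: days missing from both datasets are counted once, not twice. A minimal Python sketch, with hypothetical function names and using the pre-lightning and full-season values from Table 4 as inputs:

```python
def total_missing(flow_regime_missing, xmr_missing, overlap):
    """Days missing from either dataset, counting overlap days only once."""
    return flow_regime_missing + xmr_missing - overlap

def percent_available(possible_days, missing_days):
    """Percent of possible days with all data available."""
    return 100.0 * (possible_days - missing_days) / possible_days

# Pre-lightning sub-season: 6 and 23 missing days, 3 of them in both datasets.
pre_missing = total_missing(6, 23, 3)             # 26 days
pre_available = percent_available(340, pre_missing)

# Entire warm season over the 20-year POR.
season_missing = total_missing(72, 318, 33)       # 357 days
season_available = percent_available(3680, season_missing)
```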
Table 4. Summary of available data in the POR. The first column contains the names of the sub- 
seasons, where Total is for the entire season. The two columns under “# POSSIBLE DAYS” show the 
number of days in 1 and 20 seasons. The three columns under “# MISSING DAYS” show the number of 
unavailable days due to missing data from each dataset in the subheadings, and the number of days 
missing combined from both datasets. The value in parentheses in the third column is the number of 
days in which data were missing from both datasets. The final column shows the number of days with 
all data available. The percent of total possible days is given in parentheses. 

                  # Possible Days              # Missing Days                     Total Available
Sub-Seasons       1 Year   20 Years   MFL TBW JAX XMR   XMR   Total (Overlap)    (% of # Possible)
Pre-Lightning       17        340            6           23       26 (3)             314 (92)
Ramp-Up             19        380            9           23       27 (5)             353 (93)
Lightning           72       1440           35          122      143 (14)           1297 (90)
Ramp-Down           56       1120           15          105      115 (5)            1005 (90)
Post-Lightning      20        400            7           45       46 (6)             354 (89)
Total              184       3680           72          318      357 (33)           3323 (90)



5.1.2 Development and Verification Datasets 

The development dataset required enough samples so that the resulting set of equations was stable, 
i.e. the equations would maintain consistent forecast accuracy on different datasets. The verification 
dataset was needed for equation testing in order to have a more realistic view of how the equations would 
perform in operations. It was expected that the equations would not perform as well on the verification 
data as they would on the data from which they were developed. However, if performance were a great 
deal worse with the verification data, this would indicate that either too many predictors were chosen and 
the equations were fit too strongly to the development data, or the development dataset was too small. 

The candidate predictors and predictand for each sub-season were stratified into development and 
verification datasets. Care was taken to ensure there would be at least 250 events in the development 
dataset, while still having enough events in the verification dataset to make reasonable conclusions about 
equation performance. Of the 20 seasons in the POR, 16 were used for equation development and 4 were 
set aside for equation verification. 

The stratification did not involve choosing whole warm season years for each dataset, but rather 
individual warm season days. Days for the verification dataset were chosen first. Given that there are 184 
days in the warm season, the random number generator in Microsoft Excel was used to create four sets of 
184 numbers representing the years 1989 to 2008. The four sets of years were assigned to each day. Thus, 
each day in the warm season was represented by days from four random years. This ensured that each day 
was equally represented in the verification and development datasets. Care was taken to ensure there 
were no duplicate years for each day from the random number generator. For example, the verification 
dataset contains 1 May 1996/2001/2004/2008, ..., 31 October 1997/2003/2006/2007. All other dates were 
made part of the development dataset. This random method ensured that any abnormal convective season 
would not skew the development of the equations or their verification. Table 5 shows the possible number 
of events for the development and verification datasets and the actual number after accounting for missing 
data. Note the number of days in the development dataset for each sub-season in the right-most column. All 
are above the 250 event threshold defined by the WMO needed to develop reliable equations. 
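
The day-by-day random selection can be sketched in Python in place of the Excel random number generator. The function name and seed are hypothetical; `random.sample` draws without replacement, which enforces the no-duplicate-years condition for each day.

```python
import random

def pick_verification_years(n_days=184, first_year=1989, last_year=2008,
                            years_per_day=4, seed=None):
    """For each warm-season day number, choose distinct years for verification.

    Returns {day_number: sorted list of years}. Every (day, year) pair not
    chosen here belongs to the development dataset.
    """
    rng = random.Random(seed)
    pool = list(range(first_year, last_year + 1))
    return {day: sorted(rng.sample(pool, years_per_day))
            for day in range(1, n_days + 1)}

verification = pick_verification_years(seed=45)

def in_verification(day_number, year):
    """True if this day of this year was set aside for verification."""
    return year in verification[day_number]
```

Selecting years independently for each day spreads any unusually active or quiet season across both datasets, so a single year does not dominate either one.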
Table 5. Summary of available data. The first column contains the name of each sub-season, where 
Total is for the entire season. The three columns under “# POSSIBLE DAYS” show the number of days in 
20 warm seasons, the number of days for equation verification, and the number for development. The 
three columns under “# AVAILABLE DAYS” show the number of days actually available in the POR 
after accounting for missing data (Table 4), and the actual number of days for verification and development. 

                          # Possible Days                        # Available Days
Sub-Seasons       Total   Verification   Development     Total   Verification   Development
Pre-Lightning       340         68            272          314         62            252
Ramp-Up             380         76            304          353         66            287
Lightning          1440        228           1212         1297        267           1030
Ramp-Down          1120        224            896         1005        197            808
Post-Lightning      400         80            320          354         73            281
Total              3680        676           3004         3323        665           2658



5.2 Equation Development 

As in Phases I and II, logistic regression was used to create five equations, except for each sub-season 
in this case instead of each month. Predictor selection was conducted for each individual sub-season to 
account for the possibility that different variables may be more critical to convection formation as the 
season progresses. Detailed descriptions of logistic regression and the predictor selection procedure with 
supporting figures and equations are found in the Phase II final report (Lambert 2007). For the sake of 
brevity, these descriptions will not be repeated in this report since the procedures were followed exactly 
for this task. 

The AMU developed and tested several versions of each equation, each with varying numbers of 
predictors. The version that performed best on the verification data set was chosen as the final equation. 
Table 6 shows the predictors for each of the sub-season equations in rank order of their importance in 
predicting lightning. The predictor names are color-coded to highlight their occurrence in each equation. 
Blue identifies TI, which was chosen as the most important predictor in the first four sub-seasons. Red 
identifies the flow regime lightning probabilities. This parameter was the second predictor chosen in the 
ramp-up, lightning, and ramp-down sub-seasons, and the most important predictor in the post-lightning 
sub-season. Persistence is in green and was chosen as the third predictor for the ramp-up and lightning 
sub-seasons. VT and LI were used only once as the last predictors for the ramp-down and post-lightning 
sub-seasons, respectively. The first predictor in the first four equations, TI, accounts for instability and 
moisture in the profile, which are both necessary ingredients for thunderstorm formation. The flow regime 
probability accounts for the lifting mechanism, or lack thereof, from the low-level flow interacting with 
the sea breeze fronts, which occurs almost daily in the warm season. 


Table 6. The final predictors for each sub-season equation, in rank order of their importance in 
predicting lightning occurrence. Predictor names are colorized to highlight their occurrence in each 
equation. Vertical Totals and Lifted Index are in black font since they were only used once. 

Pre-Lightning     Ramp-Up           Lightning         Ramp-Down         Post-Lightning
Thompson Index    Thompson Index    Thompson Index    Thompson Index    Flow Regime
                  Flow Regime       Flow Regime       Flow Regime       Lifted Index
                  Persistence       Persistence       Vertical Totals
5.3 Equation Performance 

The predictors from the four-warm-season verification dataset were used in the Phase II and III 
equations to produce ‘forecast’ probabilities. Using the verification dataset provided an independent 
assessment of equation performance that could be used to conclude how the equations will perform in 
future operations. The forecast probabilities were compared with the binary lightning observations in the 
verification dataset using the Brier Skill Score, which is a measure of equation performance versus other 
forecast methods, in this case Phase III performance over Phase II. If Phase III outperformed Phase II, 
other tests to determine reliability and skill would be conducted. Otherwise, testing would cease and the 
Phase III equations would not be transitioned into operations. 

5.3.1 Phase II and III Forecast Probabilities 

The Phase II equation forecasts were used as a benchmark to determine if the Phase III equations 
improved the forecast. In Phase II, an equation was developed for each month, May-September. So that the 
Phase II equations would perform as they currently do in operations, and to allow a fair comparison with the 
Phase III equations, the verification data were stratified by month for the probability calculations, 
excluding October. The Phase II flow regime probabilities were also calculated for each month and were 
different than those in Table 3. Therefore, the flow regime values from Phase II were used in the Phase II 
equations. Care was taken to make sure the flow regime probability values matched the correct flow 
regime day in the verification data. Once all the probabilities were calculated for each month, the values 
were appended to create a non-stratified full season dataset, and then re-stratified into the sub-seasons 
described in Section 4.4 for comparison with the Phase III probabilities. 

The Phase III probabilities were calculated for all sub-seasons, May - October. The performance of 
the post-lightning sub-season equation could not be compared to the Phase II equations since there was 
not an October equation from that work. The ramp-down season was compared, but only using data 
through the end of September even though the equation development included data through 11 October. 
This caused 39 days out of 197 (~20%) in the ramp-down verification dataset to be excluded from the 
comparison. 

5.3.2 Brier Skill Scores 

The Brier Skill Scores were calculated for each individual sub-season to show how each equation 
performed against four standard forecast methods and the Phase II equations. The number of available 
days in each sub-season of the verification data ranged from 62-267 (Table 5). The pre-lightning, ramp-up, 
and post-lightning sub-season samples were small, but large enough to provide a reasonable estimate of 
relative skill with the Brier Skill Score. The five forecast methods were 

• 1-day persistence, 

• Daily climatology (Figure 10), 

• Flow regime probabilities (Table 3), 

• Sub-seasonal probabilities (percent of days lightning occurred), and 

• Phase II equation probabilities. 

The AMU began by first calculating the mean squared error (MSE) between the forecasts and 
observations for all six forecast methods using the equation 

$$ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (p_i - o_i)^2 \quad \text{(Wilks 2006)}, $$

where $n$ is the number of forecast/observation pairs, $p_i$ is the probability associated with the forecast 
method, and $o_i$ is the corresponding binary lightning observation. The skill of the Phase III equations over 
the five forecast methods was calculated using the Brier Skill Score (SS) equation: 

$$ SS = \left( \frac{\mathrm{MSE}_{eqn} - \mathrm{MSE}_{ref}}{\mathrm{MSE}_{perfect} - \mathrm{MSE}_{ref}} \right) \times 100 \quad \text{(Wilks 2006)}, $$

where $\mathrm{MSE}_{eqn}$ was the MSE of the Phase III equations, $\mathrm{MSE}_{ref}$ was the MSE of the forecast method against which the 
equations were tested, and $\mathrm{MSE}_{perfect}$ was the MSE of a perfect forecast, which is always 0. The SS 
represents a percent improvement or degradation in skill of the equation over the reference forecast when 
it is positive or negative, respectively. 
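
The two equations translate directly into a few lines of Python. The sketch below uses made-up forecast probabilities and observations purely for illustration; the function names are hypothetical.

```python
def mse(probs, obs):
    """Mean squared error between forecast probabilities and binary observations."""
    if len(probs) != len(obs):
        raise ValueError("need one observation per forecast")
    return sum((p - o) ** 2 for p, o in zip(probs, obs)) / len(probs)

def skill_score(mse_eqn, mse_ref, mse_perfect=0.0):
    """Brier Skill Score in percent; positive means the equation beats the reference.

    Undefined (division by zero) if the reference forecast is itself perfect.
    """
    return (mse_eqn - mse_ref) / (mse_perfect - mse_ref) * 100.0

# Made-up example: five days, 1 = lightning observed, 0 = none.
obs = [1, 0, 0, 1, 0]
eqn_probs = [0.8, 0.2, 0.1, 0.6, 0.3]   # hypothetical equation output
ref_probs = [0.4] * 5                    # hypothetical constant reference forecast

ss = skill_score(mse(eqn_probs, obs), mse(ref_probs, obs))
```

Here the equation MSE is 0.068 and the reference MSE is 0.24, so the skill score is about 72%, a 72% improvement over the reference.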

The SS values for each of the Phase III equations and a composite result for the entire warm season 
are shown in Table 7. The Phase III equations show a double-digit improvement in skill for most of the 
first four methods in the table, except for the 7% improvement over the flow regime probabilities in the 
pre-lightning and lightning sub-seasons. Of the first four methods, the smallest percent improvements 
were over the probabilities based on flow regime. The excellent performance of the Phase III equations 
over the first four methods did not foretell the dismal performance against the Phase II equations. In no 
sub-season did the Phase III equations outperform the Phase II equations, although the degradation in 
performance for the lightning sub-season was small if not insignificant. 


Table 7. The percent (%) improvement or degradation (red font) in skill of the Phase III over the 
Phase II equations and other standard forecast methods using the verification data. 

Forecast Method           Pre-Ltg   Ramp-Up   Lightning   Ramp-Dn   Post-Ltg   Season
Persistence                  52        48         51         47        57        50
Daily Climatology            17        18         25         23        23        23
Sub-Season Climatology       18        22         25         27        21        31
Flow Regime                   7        13          7         15        18        11
Phase II Equations          -12       -12       -0.6       -4.1       n/a      -3.6


The degradation in skill of the Phase III equations could have several causes. The development 
datasets for the pre-lightning and ramp-up seasons had just enough samples to meet the WMO criteria, 
but had fewer samples than the monthly datasets in Phase II, which ranged from 368-404 samples 
(Lambert 2007). More cases may result in better predictor selection and coefficient calculation for the 
logistic regression. However, the lightning and ramp-down sub-seasons had 1030 and 808 samples in 
their development datasets and the equations were still under-performers. The data were not stratified by 
sub-season start dates in each individual year; instead, the same start dates were used in every year. It is highly 
likely that this wholesale method of choosing start dates resulted in some days being chosen for a 
particular sub-season that were actually part of the sub-season before or after. As a result, the predictor 
values would not all be representative of the sub-season. This could cause different predictors to be 
chosen and would certainly affect the value of the coefficients of the predictors in the equation. 

Regardless of the cause, the Phase III equations produced a degradation in skill and will not be 
transitioned to operations. This also dictated that no more testing of Phase III equation performance 
would be conducted. 
6 Summary and Recommendations 

The AMU created new logistic regression equations in an effort to increase the skill of the Objective 
Lightning Forecast Tool developed in Phase II (Lambert 2007). One equation was created for each of five 
sub-seasons based on the daily lightning climatology instead of by month as was done in Phase II. The 
assumption was that these equations would capture the physical attributes that contribute to thunderstorm 
formation more so than monthly equations. However, the SS values in Section 5.3.2 showed that the 
Phase III equations had worse skill than the Phase II equations and, therefore, will not be transitioned into 
operations. The current Objective Lightning Forecast Tool developed in Phase II will continue to be used 
operationally in MIDDS. 

Three warm seasons were added to the Phase II dataset to increase the POR from 17 to 20 years 
(1989-2008), and data for October were included since the daily climatology showed lightning 
occurrence extending into that month. None of the three methods tested to determine the start of the sub- 
season in each individual year were able to discern the start dates with consistent accuracy. Therefore, the 
start dates were determined by the daily climatology shown in Figure 10 and were the same in every year. 

The procedures used to create the predictors and develop the equations were identical to those in 
Phase II. The equations were made up of one to three predictors. TI and the flow regime probabilities 
were the top predictors followed by 1-day persistence, then VT and LI. Each equation outperformed four 
other forecast methods by 7-57% using the verification dataset, but the new equations were outperformed 
by the Phase II equations in every sub-season. The degradation may have occurred because 
the same sub-season start dates were used in every year. It is likely that days from adjacent sub-seasons 
fell within the beginning and end of each defined sub-season in individual years, which could have 
degraded equation performance. 
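
The percentage improvements quoted above are skill scores. As an illustration only, the sketch below assumes the common MSE-based form, SS = 100 x (1 - MSE_forecast / MSE_reference), in which a perfect forecast has an MSE of zero; the definition given in the verification section of this report should be taken as authoritative. The function names and the sample values are hypothetical and are not data from this task.

```python
def mean_squared_error(probabilities, outcomes):
    """Mean squared error between forecast probabilities (0-1) and
    observed outcomes (1 = lightning occurred, 0 = no lightning)."""
    if not probabilities or len(probabilities) != len(outcomes):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


def skill_score_percent(forecast_probs, reference_probs, outcomes):
    """Percent improvement of a forecast over a reference forecast,
    assuming SS = 100 * (1 - MSE_forecast / MSE_reference)."""
    mse_forecast = mean_squared_error(forecast_probs, outcomes)
    mse_reference = mean_squared_error(reference_probs, outcomes)
    if mse_reference == 0:
        raise ValueError("reference forecast is perfect; SS is undefined")
    return 100.0 * (1.0 - mse_forecast / mse_reference)


# Hypothetical four-day sample: the reference always forecasts 50%.
observed = [1, 0, 1, 0]
equation_forecast = [0.8, 0.2, 0.8, 0.2]
reference_forecast = [0.5, 0.5, 0.5, 0.5]
ss = skill_score_percent(equation_forecast, reference_forecast, observed)
```

Under this assumed definition, a positive value means the equation improved on the reference method, zero means no improvement, and a negative value means the reference method performed better.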

6.1 Predictor Comments 

The candidate predictors used to develop the equations were the same as those in Phase II with two 
additions: the mean wind speed in the 1000-700 mb layer and L57 (lapse rate between 700 and 500 mb). These 
two were added because the 45 WS forecasters found them to be important in forecasting thunderstorms 
and lightning over KSC/CCAFS. Also, the 850 and 500 mb wind speeds were important predictors in the 
previously operational Neumann-Pfeffer Thunderstorm Index (Pfeffer 1967). Throughout the statistical 
predictor selection process, neither of these predictors was chosen as important to lightning occurrence; 
indeed, both were among the least important predictors, mathematically speaking. This result was 
surprising to those who make the lightning forecast. They find that strong or weak flow will either pin 
convection to the coast, allow the sea breeze to penetrate inland slowly, or force the sea breeze to 
penetrate inland early before storms can start. This is critical for the placement of thunderstorms and, 
therefore, lightning. For the 2010 warm season, L57 proved to be an important predictor. PW was 
abundant, but the lack of instability did not support thunderstorm formation at KSC/CCAFS. 

The equations were developed using data from 16 warm seasons, and the predictors chosen were those 
that were important for lightning occurrence most of the time, not just in special cases. This does not mean 
that wind speed and L57 are unimportant or should be ignored on any given day. The output from 
this tool is best treated as a climatological probability and as a first guess in 
developing the daily lightning probability, not the final value. Forecasters should look at wind speed, L57, 
the other candidate predictors, other observations, and model data to determine the final lightning 
probability for the day. 
6.2 Suggested Future Work 

Future work could include a task to create new monthly equations that include October since the daily 
climatology developed in this task showed it to contain significant lightning probabilities. Since lightning 
probability forecasts for May are provided in the current tool, probabilities for October should also be 
provided given that the values in early October were higher than those in early May. Also, in any 
individual October, the lightning season could extend past the early part of the month. This would be a 
relatively easy task since the procedures developed in the previous phases would be followed, the only 
difference being the addition of data from seasons that pass between now and the beginning of the work. 

Future work should include an effort to create an objective method that determines the sub-season 
start dates in each individual year. The inability to do that in this task is likely the reason for the degraded 
performance of the equations. MDA showed promise as a method to do this, and other statistical methods 
could be tested. While the forecasters could determine the current lightning sub-season subjectively, an 
objective technique to identify the sub-season is needed to develop and verify the sub-season equations. 
Subjectively identifying a sub-season after it has begun would be time-consuming, since it requires 
analyzing lightning events across Florida and other weather data over several days in real time. One way 
to objectively identify lightning sub-seasons would be to examine lightning data from across Florida and 
determine a threshold number of flashes, perhaps over a number of consecutive days, or a large change 
in flash counts during a range of dates. Such a dataset would be very large and time-consuming to analyze. 
The time-consuming nature of conducting a subjective analysis to determine the sub-season start dates makes 
the development of an objective method the desired approach. 
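
The threshold idea described above can be sketched as follows. This is only an illustration of one possible approach, not a method developed or tested in this task: the function name, the threshold of 100 flashes, the three-day requirement, and the sample counts are all hypothetical assumptions.

```python
def find_subseason_start(daily_flash_counts, threshold, min_consecutive_days):
    """Return the index of the first day of the first run of at least
    `min_consecutive_days` consecutive days whose flash counts meet or
    exceed `threshold`, or None if no such run exists."""
    if min_consecutive_days < 1:
        raise ValueError("min_consecutive_days must be at least 1")
    run_length = 0
    for day_index, count in enumerate(daily_flash_counts):
        if count >= threshold:
            run_length += 1
            if run_length == min_consecutive_days:
                # Step back to the first day of the qualifying run.
                return day_index - min_consecutive_days + 1
        else:
            run_length = 0  # The run was broken; start counting again.
    return None


# Hypothetical daily flash counts for one week.
counts = [0, 120, 10, 150, 200, 180, 5]
start = find_subseason_start(counts, threshold=100, min_consecutive_days=3)
```

In this hypothetical week the single active day at index 1 is ignored because it is not sustained, and the start is placed at index 3, the first of three consecutive days at or above the threshold. Requiring consecutive days is one way to keep an isolated lightning day from triggering a sub-season change.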

Finally, any future work should continue to consider L57 and the 1000-700 mb layer wind speed as 
candidate predictors. As stated in Section 6.1, the forecasters consider these values to be very important 
predictors of lightning occurrence in the KSC/CCAFS area. The forecasters should be consulted at the 
beginning of the work for other candidate predictors not listed in this report so a thorough assessment of 
the importance of each can be made. Consultation with the forecasters prior to and during the 
development of the equations is critical to the success of the tool as they have the experience to know 
what is important in lightning forecasting. 
List of Acronyms 


14 WS       14th Weather Squadron
45 WS       45th Weather Squadron
AMU         Applied Meteorology Unit
Avg.85.RH   Mean RH in the 825-525 mb layer
Avg.86.RH   Mean RH in the 800-600 mb layer
AYS         Waycross, Ga. 3-letter identifier
CCAFS       Cape Canaveral Air Force Station
CGLSS       Cloud-to-Ground Lightning Surveillance System
CSR         Computer Sciences Raytheon
GUI         Graphical User Interface
JAX         Jacksonville, Fla. 3-letter identifier
KI          K-Index
KSC         Kennedy Space Center
L57         Lapse rate between 700-500 mb
LI          Lifted Index
MDA         Multiple Discriminant Analysis
MFL         Miami, Fla. 3-letter identifier
MIDDS       Meteorological Interactive Data Display System
MSE         Mean Squared Error
NE          Northeast flow regime
NPTI        Neumann-Pfeffer Thunderstorm Index
NW          Northwest flow regime
NWS MLB     National Weather Service Melbourne, Fla.
PBI         West Palm Beach, Fla. 3-letter identifier
POR         Period of Record
PW          Precipitable Water
RH          Relative Humidity
SE          Southeast flow regime
SS          Skill Score
SW          Southwest flow regime
SWEAT       Severe Weather Threat Index
T500        Temperature at 500 mb
TBW         Tampa, Fla. 3-letter identifier
TI          Thompson Index
VT          Vertical Totals
WMO         World Meteorological Organization
XMR         CCAFS rawinsonde 3-letter identifier
NOTICE 


Mention of a copyrighted, trademarked or proprietary product, service, or document does not constitute 
endorsement thereof by the author, ENSCO Inc., the AMU, the National Aeronautics and Space 
Administration, or the United States Government. Any such mention is solely for the purpose of fully 
informing the reader of the resources used to conduct the work reported herein. 